Detection and suppression of returned audio at near-end

ABSTRACT

Audio from a near-end that has been acoustically coupled at the far-end and returned to the near-end unit is detected and suppressed at the near-end of a conference. First and second energy outputs for separate bands are determined for the near-end audio being sent from the near-end unit and for the far-end audio being received at the near-end unit. The near-end unit compares the first and second energy outputs to one another for each of the bands over a time delay range and detects the return of the sent near-end audio in the received far-end audio based on the comparison. The comparison can use a cross-correlation to find an estimated time delay used for further analysis of the near and far-end energies. The near-end unit suppresses any detected return by muting or reducing what far-end audio is output at its loudspeaker.

BACKGROUND

During a conference, at least two communication systems (i.e., a near-end unit and a far-end unit) participate in a call. Typically, the units will have near-end echo cancellation. For example, a near-end unit 10 schematically shown in FIG. 1 has an audio decoder 12, an audio coder 14, a loudspeaker 20, and a microphone 40 and communicatively couples to a far-end unit 16 using techniques known in the art. During a conference, the audio decoder 12 receives far-end audio, decodes it, and sends the decoded audio to the loudspeaker 20 so the near-end participant can listen. In turn, the microphone 40 picks up near-end audio from the participant, and the audio coder 14 encodes the near-end audio and sends it to the far-end unit 16. Due to the proximity of the loudspeaker 20 and microphone 40, acoustic coupling (indicated by arrow 11) may occur in which far-end audio output by the loudspeaker 20 is picked up by the microphone 40 and fed back to the far-end unit 16.

To reduce the effects of this acoustic coupling (11), the near-end unit 10 has a near-end echo canceller 30 that operates between the decoder/coder 12/14 and the loudspeaker/microphone 20/40. The near-end echo canceller 30 subtracts the audio emitted from the loudspeaker 20 that has been picked up by the microphone 40. The audio coder 14 then transmits the resulting signal to the far-end unit 16. In this way, the near-end echo cancellation reduces the acoustic coupling in the loudspeaker-to-microphone acoustic path at the near-end and helps to prevent the far-end participant from hearing his own voice come back to him as returned echo.

Although near-end echo cancellation may be used at the near-end unit, the far-end unit 16 in some instances may not have a working acoustic echo canceller. In this case, the near-end participant will hear his voice come back to him due to the acoustic coupling (indicated by arrow 17) between the loudspeaker and microphone at the far-end. Therefore, near-end echo cancellation may benefit the person at the far-end, but it does nothing to prevent the near-end from hearing near-end audio returned from the far-end as echo.

The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.

SUMMARY

Audio from a near-end that has been acoustically coupled at the far-end and returned to the near-end unit is detected and suppressed at the near-end of a conference. First energy outputs for separate bands are determined for the near-end audio being sent from the near-end unit, and second energy outputs for the separate bands are determined for the far-end audio being received at the near-end unit. The near-end unit compares the first and second energy outputs for each of the bands to one another over a time delay range and detects the return of the sent near-end audio in the received far-end audio based on the comparison. The comparison can use a cross-correlation to find an estimated time delay used for further analysis of the near and far-end energies. The near-end unit suppresses any detected return by muting or reducing what far-end audio is output at its loudspeaker.

The foregoing summary is not intended to summarize each potential embodiment or every aspect of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a near-end echo canceller according to the prior art.

FIG. 2 illustrates an arrangement for far-end echo detection and suppression according to the present disclosure.

FIG. 3 illustrates a signal processing operation for far-end echo detection and suppression according to the present disclosure.

FIG. 4 diagrammatically illustrates the far-end echo detection and suppression of an example signal.

FIG. 5 illustrates additional stages of the signal processing operation of FIG. 3.

FIG. 6 illustrates a multipoint bridge unit having a far-end echo detection and suppression module according to the present disclosure.

DETAILED DESCRIPTION

A. Near-End Unit Having Echo Detection and Suppression

A near-end unit 10 schematically shown in FIG. 2 can be used for teleconferencing or videoconferencing. As noted previously, the near-end unit 10 has an audio decoder 12, an audio coder 14, a loudspeaker 20, and a microphone 40 and communicatively couples to a far-end unit 16 using techniques known in the art. During a conference, the audio decoder 12 receives far-end audio, decodes it, and sends the decoded audio to the loudspeaker 20 so the near-end participant can listen. In turn, the microphone 40 picks up near-end audio from the participant, and the audio coder 14 encodes the near-end audio and sends it to the far-end unit 16.

To reduce the effects of the acoustic coupling (indicated by arrow 11) at the near-end, the near-end unit 10 has a near-end echo canceller 30 that subtracts the audio emitted from the loudspeaker 20 that has been picked up by the near-end microphone 40 through acoustic coupling. As noted previously, the near-end echo canceller 30 reduces the acoustic coupling in the loudspeaker-to-microphone acoustic path at the near-end and helps to prevent the far-end participant from hearing his voice come back to him as echo.

In addition to these components, however, the near-end unit 10 has a module 50 for detecting and suppressing near-end audio returned from the far-end through acoustic coupling (indicated by arrow 17). When the far-end unit 16 fails to provide sufficient echo cancellation, the module 50 detects the presence of near-end audio returned in the far-end audio and prevents or suppresses that returned audio from being relayed through the loudspeaker 20 at the near-end.

As shown, the module 50 receives the decoded audio from the audio decoder 12 before the audio proceeds to the loudspeaker 20 and echo canceller 30. In this arrangement, the module 50 detects returned near-end audio in the far-end audio decoded by the audio decoder 12. As noted herein, the returned audio occurs when near-end audio is picked up by the microphone 40 at the near-end, is transmitted to the far-end unit 16, undergoes loudspeaker-to-microphone acoustic coupling (17) at the far-end, and is then returned to the near-end's decoder 12 to be output by the near-end loudspeaker 20. As expected, receiving one's own voice replayed back can be very distracting for the near-end participant in the conference. When the module 50 detects such returned audio, the near-end loudspeaker 20 may be muted or otherwise turned down, thereby eliminating or reducing what returned audio is output by the loudspeaker 20 at the near-end.

Although the location of the module 50 is conceptually straightforward in the near-end unit 10, there are a number of practical challenges at the near-end to detecting near-end audio that has been returned from the far-end. For example, the module 50 must be able to handle any time delay in the far-end audio. This delay can range anywhere from tens of milliseconds to 1 second or more and may change over time during a conference call. In addition, the audio coder and other components (not shown) at the far-end unit 16 may introduce significant non-linear distortion and noise to the returned audio that the module 50 must deal with. Further, the returned audio may have wide-ranging amplitude and frequency responses so that the returned audio's amplitude may be very weak or very strong at any given time and may alter drastically in frequency from the original signal.

B. Process for Echo Detection and Suppression

To detect and suppress the returned audio in light of the above challenges, the module 50 of FIG. 2 performs a signal processing operation 100 as shown in FIG. 3. In the operation 100, the module 50 obtains both the near-end and far-end audio signals (Blocks 110/120) and uses filterbanks to separately filter the obtained signals into a number of bands (Blocks 112/122). The module 50 samples the output for each band and each signal at a predetermined interval (Blocks 114/124) and finds the energy of the sampled output for each band (Blocks 116/126).

As shown in FIG. 4, for example, the module 50 can obtain the near-end and far-end audio signals 150/160 at a sample rate of about 48 kHz, and the near-end and far-end signals are fed to separate filterbanks 152/162 that consist of five sub-bands centered at about 400, 800, 1200, 1600, and 2000 Hz. The filterbanks 152/162 can use filters that are cosine and sine modulated versions of a Blackman window of 10-ms in length, producing real and imaginary parts of the sub-band signal every 20-ms. Such a filterbank may be implemented very efficiently because of the commonality of operations between the different sub-bands. These separate filterbanks 152/162 filter the signals into five bands (1-5). The output of each band (1-5) can be sampled at every 20-ms, and the module 50 can find the energy of the sampled output for each band (1-5) to produce sampled output signals 155/165.
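The sub-band energy computation described above can be sketched as follows. This is a minimal illustration only, assuming a single 10-ms frame (480 samples at 48 kHz) and direct evaluation of the cosine- and sine-modulated Blackman window; the function names `blackman` and `band_energies` are hypothetical, and the sketch ignores the shared-operation optimizations mentioned above.

```python
import math

def blackman(n_taps):
    # Standard Blackman window coefficients.
    return [0.42
            - 0.5 * math.cos(2 * math.pi * i / (n_taps - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n_taps - 1))
            for i in range(n_taps)]

def band_energies(frame, sample_rate, centers_hz):
    """Energy of one windowed frame in each band (real^2 + imag^2)."""
    win = blackman(len(frame))
    energies = []
    for fc in centers_hz:
        w = 2 * math.pi * fc / sample_rate
        # Cosine-modulated window gives the real part, sine-modulated the imaginary part.
        re = sum(s * h * math.cos(w * i) for i, (s, h) in enumerate(zip(frame, win)))
        im = sum(s * h * math.sin(w * i) for i, (s, h) in enumerate(zip(frame, win)))
        energies.append(re * re + im * im)
    return energies

# Illustrative input (an assumption): a 10-ms frame of an 800 Hz tone at 48 kHz.
SAMPLE_RATE = 48000
CENTERS_HZ = [400, 800, 1200, 1600, 2000]
tone = [math.sin(2 * math.pi * 800 * i / SAMPLE_RATE) for i in range(480)]
energies = band_energies(tone, SAMPLE_RATE, CENTERS_HZ)
strongest_band = energies.index(max(energies))
```

Calling `band_energies` once per 20-ms interval for both the near-end and far-end signals would yield the two sets of energy sequences that the later stages compare.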

Continuing with the operation 100 of FIG. 3, the module 50 compares the energy variations in the near-end and far-end signals as a function of time in a range of time delay (Block 130). In doing this comparison, the module 50 cross-correlates the corresponding bands of each of the signals at different time delays. From this cross-correlation, the module 50 determines the presence of returned near-end audio in the received far-end audio at a given time delay when there is a peak in the cross-correlation values for each of the bands of the two compared signals.

As shown in FIG. 4, for example, the operation 100 runs a cross-correlation window 170 described in more detail below. From the results of the window 170, the operation 100 determines at which time delay value 180 high cross-correlation values 185 occur between what is sent by the near-end unit at one point in time and what is received from the far-end unit at a subsequent point in time. High cross-correlation values 185 at a particular time delay value 180 indicate the presence of returned near-end audio in the far-end audio received at this time delay.

Returning to FIG. 3, the operation 100 determines from the cross-correlation results whether near-end audio is probably being returned in the far-end audio received (Decision 135). If returned audio is not occurring (no for Decision 135), the operation 100 can return to repeating steps of finding energy to determine if audio is currently being returned (i.e., Blocks 110, 120, etc.). However, if enough high correlations occur at the same time delay value (yes for Decision 135), the operation 100 declares the presence of returned near-end audio at that time delay. At this point, the module 50 uses the estimated time delay value at which the near-end audio is probably being returned from the far-end and continues processing to handle the returned audio.

Although the cross-correlation process (Block 130) for detecting the presence of the returned audio at a time delay is reliable, the process can be slow due to the integration over time needed to obtain accurate estimates. If the cross-correlation is used alone, some returned audio may be output at the near-end. Accordingly, the cross-correlation process (Block 130) is essentially used to estimate a time delay value for potential returned audio. Once this is done, the module 50 preferably uses a faster-responding detection scheme that directly uses the sub-band energy when the cross-correlation process (Block 130) has determined that near-end audio is being returned.

To handle the returned audio in this faster scheme, the operation 100 determines the echo return loss (ERL) using a peak energy algorithm when the cross-correlation is high enough (Block 200). In general, this process 200 finds the peak energy of the outgoing (near-end) audio in each band and finds the peak energy of the incoming (far-end) audio in each band at the estimated time delay. If the near-end energy is being returned in the received far-end audio at the time delay, then the peak energy for one or more of the bands will reflect this.

After finding the echo return loss, the operation 100 then implements a sub-band doubletalk detector using the estimated time delay of the returned audio (Block 210). At this point, the operation 100 knows the estimated time delay at which the near-end audio is being returned from the far-end, and the operation 100 knows the energy that comes back at that estimated time delay. Therefore, the doubletalk detection (Block 210) can determine whether the energy coming in from the far-end audio is primarily or solely due to the near-end audio (speech) being returned as echo or whether someone at the far-end is speaking over some portion of possible echo.

In essence, if the doubletalk detection (Block 210) determines that peak energy in any particular sub-band of the far-end audio at the time delay is greater than the peak energy of the same band in the near-end audio, then the doubletalk detection (Block 210) can determine that far-end speech is occurring. Therefore, the doubletalk detection will not suppress or mute the far-end audio being output at the near-end unit 10. Ultimately, if doubletalk is not occurring, then the operation 100 has the module 50 suppress or mute the far-end audio to prevent returned near-end audio from being output at the near-end. Again, the near-end loudspeaker 20 may be muted or otherwise turned down to suppress the returned audio.

1. Details of Cross-Correlation Process

Given the above description of the process for determining the presence of returned audio and its time delay, discussion now turns to particular examples of the cross-correlation process for estimating a time delay at which near-end audio is potentially being returned. In one implementation, the module 50 in FIG. 3 can use a moving average operation to determine the cross-correlation between the near and far-end sub-band energies. For two sequences in time x(t), y(t), for example, a definition of cross-correlation for a time delay or lag k at a time index n is provided by:

$\mathrm{Corr}[n][k] = \dfrac{\sum_{t=n-N+1}^{n} x(t)\,y(t-k)}{\sqrt{\sum_{t=n-N+1}^{n} x^{2}(t)}\;\sqrt{\sum_{t=n-N+1}^{n} y^{2}(t)}} \qquad (1)$

Here, the first sequence in time x(t) can correspond to the sampled energy of a given band for the near-end audio, while the second sequence in time y(t) can correspond to the sampled energy of the given band for the far-end audio. In equation (1), if the value of Corr[n][k] is close to 1, then the time sequences x(t), y(t) are very similar in shape for the time index n and time delay or lag k. The summation over N terms in equation (1) may be viewed as a moving average operation.
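A direct evaluation of equation (1) might look like the sketch below. The function name `corr_moving_average` and the periodic test sequences are assumptions made for illustration. The indexing follows equation (1) literally, so in this sketch the sequence passed as `x` is a delayed copy of `y`, and the peak therefore appears at a positive lag `k`; which signal plays which role in a real system is left as an assumption.

```python
import math

def corr_moving_average(x, y, n, k, N):
    """Normalized cross-correlation at time index n and lag k over the last N terms."""
    window = range(n - N + 1, n + 1)
    num = sum(x[t] * y[t - k] for t in window)
    den_x = sum(x[t] ** 2 for t in window)
    den_y = sum(y[t] ** 2 for t in window)
    if den_x == 0 or den_y == 0:
        return 0.0  # no energy in the window, so no meaningful correlation
    return num / (math.sqrt(den_x) * math.sqrt(den_y))

# Hypothetical energy sequences: y repeats a 7-frame pattern, x is y delayed by 3 frames.
pattern = [5, 1, 1, 1, 1, 1, 1]
y = [pattern[t % 7] for t in range(60)]
x = [pattern[(t - 3) % 7] for t in range(60)]

# Evaluate lags 0..6 at time index 40 with a 14-term window (two full periods).
scores = [corr_moving_average(x, y, n=40, k=k, N=14) for k in range(7)]
best_lag = scores.index(max(scores))
```

Each lag requires a fresh sum over N terms at every time index, which is the computational cost that motivates a recursive alternative.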

As an alternative to the moving average of equation (1), the module 50 can reduce computation by replacing the moving average operation with an Infinite Impulse Response (IIR) filter. In this implementation, the IIR filter can be provided as:

Corr[n][k] = NumCross(n) / (√DenX(n) · √DenY(n))  (2)

where

NumCross(n)=α·NumCross(n−1)+(1−α)·x(n)y(n−k)

DenX(n)=α·DenX(n−1)+(1−α)·x²(n)

DenY(n)=α·DenY(n−1)+(1−α)·y²(n)

When used, equation (2) can find the cross-correlation in time between the near-end and far-end bandpass energy time sequences (energy found at 20-ms intervals). When the cross-correlation between the near and far-end bandpass energy time sequences is high, the module 50 estimates this to be a reliable estimation that near-end audio is being returned at the corresponding time delay or lag k and uses this estimation to perform further processing.
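The recursions of equation (2) can be sketched as a small stateful object that is updated once per 20-ms energy sample. The class name `RecursiveCorrelator`, the smoothing value 0.9, and the test sequences are assumptions for illustration; one `NumCross` accumulator is kept per candidate lag, while `DenX` and `DenY` are shared across lags.

```python
import math
from collections import deque

class RecursiveCorrelator:
    """Tracks Corr[n][k] for lags 0..max_lag using the recursions of equation (2)."""

    def __init__(self, max_lag, alpha):
        self.alpha = alpha
        self.num_cross = [0.0] * (max_lag + 1)
        self.den_x = 0.0
        self.den_y = 0.0
        # y_hist[k] holds y(n - k); samples before the start are taken as zero.
        self.y_hist = deque([0.0] * (max_lag + 1), maxlen=max_lag + 1)

    def update(self, x_n, y_n):
        a = self.alpha
        self.y_hist.appendleft(y_n)  # oldest sample drops off the right end
        self.den_x = a * self.den_x + (1 - a) * x_n * x_n
        self.den_y = a * self.den_y + (1 - a) * y_n * y_n
        for k in range(len(self.num_cross)):
            self.num_cross[k] = a * self.num_cross[k] + (1 - a) * x_n * self.y_hist[k]
        den = math.sqrt(self.den_x) * math.sqrt(self.den_y)
        return [nc / den if den > 0 else 0.0 for nc in self.num_cross]

# Hypothetical energy sequences: x is y delayed by 3 frames.
pattern = [5, 1, 1, 1, 1, 1, 1]
correlator = RecursiveCorrelator(max_lag=6, alpha=0.9)
corr = []
for t in range(200):
    corr = correlator.update(pattern[(t - 3) % 7], pattern[t % 7])
peak_lag = corr.index(max(corr))
```

Each update costs one multiply-accumulate per lag instead of a sum over N terms, which is the saving the IIR form is meant to provide.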

As shown in FIG. 4, for example, the cross-correlation can be done efficiently using the Infinite Impulse Response (IIR) window 170 (as described above) with a time constant (α) of 0.8 seconds. The time delay or lag k is preferably constrained in the range of 0 to about 3 seconds (i.e., 2.56 seconds). The presence of returned audio will cause a peak in the cross-correlation value 185 at some time lag (k) 180 for each of the sub-bands. If the cross-correlation peak value 185 is high for all the sub-bands at a similar cross-correlation time lag (k) 180 for all of the sub-bands, then there is a high probability that near-end audio is being returned at that time index.
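The numbers above can be tied together in a short sketch. With energies sampled every 20 ms, a 2.56-second range corresponds to 128 lag steps. The mapping from a time constant to a smoothing coefficient via exp(-T/τ) is a common convention assumed here, not something the text specifies, and the threshold and tolerance in `consensus_lag` are hypothetical values chosen only to show the cross-band agreement test.

```python
import math

FRAME_S = 0.020       # energy sampling interval from the text
MAX_DELAY_S = 2.56    # upper end of the lag range from the text
NUM_LAGS = round(MAX_DELAY_S / FRAME_S)

def alpha_from_time_constant(tau_s, frame_s=FRAME_S):
    # Assumed mapping from a time constant to a one-pole smoothing coefficient.
    return math.exp(-frame_s / tau_s)

def consensus_lag(corr_by_band, threshold=0.8, tolerance=1):
    """Return a common peak lag if every band peaks strongly at a similar lag, else None."""
    peaks = []
    for scores in corr_by_band:
        k = max(range(len(scores)), key=scores.__getitem__)
        if scores[k] < threshold:
            return None  # this band has no convincing peak
        peaks.append(k)
    if max(peaks) - min(peaks) > tolerance:
        return None      # bands disagree about the delay
    return sorted(peaks)[len(peaks) // 2]  # median peak lag

# Hypothetical correlation values for three bands over five lags.
agreeing = [[0.1, 0.2, 0.9, 0.3, 0.1],
            [0.2, 0.1, 0.85, 0.2, 0.1],
            [0.1, 0.3, 0.95, 0.2, 0.0]]
weak_band = [[0.1, 0.2, 0.9, 0.3, 0.1],
             [0.2, 0.1, 0.4, 0.2, 0.1],
             [0.1, 0.3, 0.95, 0.2, 0.0]]
```

Here `consensus_lag(agreeing)` yields lag 2, while `consensus_lag(weak_band)` yields `None` because one band never reaches the threshold.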

2. Details of Peak Energy and Double Talk Detection

Given the above description of the cross-correlation process for estimating a time delay at which near-end audio is potentially being returned, discussion now turns to the faster-responding detection scheme that determines the echo return loss (ERL) (200) using a peak energy algorithm and then implements a sub-band doubletalk detector (210) using the estimated time delay. As shown in more detail in FIG. 5, for example, when the cross-correlation between the near and far-ends is high enough at a particular time delay, the ERL process (200) estimates the ratio of the far-end energy to the near-end energy (i.e., Echo Return Loss (ERL)) for each band (Block 202). The ERL process (200) then multiplies the near-end energy by the ERL value (Block 204). Because near-end audio is believed to be returning, the expected result should equal the far-end energy. If it does not, then the process (200) may terminate and return to estimating a probable time delay of returned audio. Otherwise, the operation 100 continues processing by implementing the sub-band doubletalk detector (210).
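For one band, Blocks 202 and 204 might be sketched as below. The function names, the use of simple maxima as the peak energies, and the relative error measure are assumptions; the text does not say how closely the predicted and received energies must agree. The lag is assumed to be smaller than the length of the energy history.

```python
def estimate_erl(near_energy, far_energy, lag):
    """Ratio of peak far-end energy to peak near-end energy, with near-end delayed by lag frames."""
    aligned_near = near_energy[:len(near_energy) - lag]
    aligned_far = far_energy[lag:]
    peak_near = max(aligned_near)
    peak_far = max(aligned_far)
    return peak_far / peak_near if peak_near > 0 else 0.0

def prediction_error(near_energy, far_energy, lag, erl):
    """Largest frame-wise gap between far-end energy and ERL * delayed near-end energy,
    expressed relative to the predicted peak (0.0 means a perfect match)."""
    aligned_near = near_energy[:len(near_energy) - lag]
    aligned_far = far_energy[lag:]
    predicted = [erl * e for e in aligned_near]
    peak = max(predicted)
    if peak == 0:
        return 0.0
    return max(abs(f - p) for f, p in zip(aligned_far, predicted)) / peak

# Hypothetical single-band energies: the far-end is the near-end delayed 2 frames, scaled by 0.25.
near = [0.0, 4.0, 8.0, 2.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 1.0, 2.0, 0.5, 0.0]
erl = estimate_erl(near, far, lag=2)
err = prediction_error(near, far, lag=2, erl=erl)
```

A large `err` would correspond to the case where the process terminates and returns to estimating a time delay.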

The doubletalk detector (210) then delays the near-end energy versus time so that it matches the delay of the far-end energy (Block 212). Then, the doubletalk detector (210) selects a first of the bands (Block 216), multiplies the ERL times the near-end energy for that band (Block 218), and compares the far-end energy to the ERL times the near-end energy (Decision 220).

If the comparison shows that the far-end energy is at or below twice its expected value (yes-Decision 220), then the band count for probable returned audio is incremented (Block 222). If the far-end energy for the band is more than twice its expected value, then the band is not added to the band count. Either way, the doubletalk detector (210) determines if more bands remain to be analyzed (Decision 226) and operates accordingly. After comparing the far-end energy to the ERL times the near-end energy band-by-band (Blocks 216 through 226), the doubletalk detector (210) determines whether enough bands of the far-end energy are at or below twice their expected value (Decision 228).

If enough bands of the far-end energy are more than (i.e., not at or below) twice their expected value (no at Decision 228), then the doubletalk detector (210) does not declare that returned audio is being received. Instead, this may indicate that a participant at the far-end is talking while the near-end is silent or is producing much less energy. Therefore, the audio being output by the near-end loudspeaker (20; FIG. 2) is not muted or reduced.

If enough bands of the far-end energy are at or below twice their expected value (yes at Decision 228), then the doubletalk detector (210) declares the presence of returned near-end audio as echo (Block 230). The number of bands in which the far-end energy must be at or below twice its expected value (the near-end energy times the ERL for that band) to warrant declaring returned audio may depend on the implementation. In general, however, if a majority or more than half of the bands show the required result, then the doubletalk detector (210) can declare that returned audio is being received.
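The band-by-band decision of Blocks 216 through 230 can be sketched as a simple count. The function name `returned_audio_detected` and the example energies are hypothetical; the factor of two and the majority rule come from the description above, and the near-end energies are assumed to be already delayed to match the far-end.

```python
def returned_audio_detected(near_by_band, far_by_band, erl_by_band, factor=2.0):
    """True when more than half of the bands have far-end energy at or below
    factor times the expected value (ERL times delayed near-end energy)."""
    count = 0
    for near, far, erl in zip(near_by_band, far_by_band, erl_by_band):
        expected = erl * near
        if far <= factor * expected:
            count += 1  # this band looks like returned near-end audio
    return count > len(erl_by_band) / 2

# Hypothetical five-band energies for one 20-ms frame.
erl = [0.25, 0.25, 0.5, 0.5, 0.1]
near = [8.0, 4.0, 2.0, 2.0, 10.0]
echo_only = [2.0, 1.0, 1.0, 1.0, 1.0]      # far-end matches ERL * near in every band
doubletalk = [9.0, 6.0, 5.0, 1.0, 1.0]     # far-end talker dominates three bands
```

With these values, `echo_only` counts all five bands and is declared returned audio, while `doubletalk` counts only two of five and is not. A practical detector would presumably also require some near-end activity before counting a band, since silence on both sides trivially satisfies the comparison.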

At this point, the module (50) of the near-end unit (10) in FIG. 2 can suppress the returned audio by muting or turning down the audio output by the near-end loudspeaker (20). The actual amount of time for muting or suppressing the audio may depend on the implementation and may last for the amount of time that returned audio is detected or may last for some predetermined amount of time. For example, the output audio can be suppressed at the near-end for a predetermined time interval, until detection of the returned audio ceases, or until speech having a magnitude greater than the returned audio is detected in the far-end audio received.
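One way to combine muting or attenuation with a hold interval is sketched below. The class name `Suppressor`, the hold length, and the gain values are assumptions; a gain of 0.0 corresponds to muting and a value between 0 and 1 to turning the loudspeaker down.

```python
class Suppressor:
    """Applies a reduced gain while returned audio is detected and for
    hold_frames frames after the last detection."""

    def __init__(self, hold_frames=10, gain=0.0):
        self.hold_frames = hold_frames
        self.gain = gain
        self.remaining = 0

    def process(self, frame, echo_detected):
        if echo_detected:
            self.remaining = self.hold_frames  # restart the hold interval
        active = echo_detected or self.remaining > 0
        if not echo_detected and self.remaining > 0:
            self.remaining -= 1
        g = self.gain if active else 1.0
        return [g * s for s in frame]

# Hypothetical run: one detection followed by three frames without detection.
suppressor = Suppressor(hold_frames=2, gain=0.0)
outputs = [suppressor.process([1.0], detected)[0]
           for detected in [True, False, False, False]]
```

In this run the detected frame and the two frames after it are muted, and the fourth frame passes through unchanged.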

As the operation 100 of FIGS. 3-5 continues during a call, the module 50 of FIG. 2 can produce half-duplex suppression of the returned audio at the near-end unit 10 so that the near-end participant will not hear her voice relayed back to her by the far-end unit's failure to cancel out any acoustic coupling that may be occurring at the far-end. However, if the far-end has a working echo canceller, the operation 100 is expected to not detect returned audio falsely. In general, the module's ability to detect returned near-end audio is expected to operate under various amplitudes, phase-variations, and changes in delay over time, as well as audio codec distortions. Based on what they hear, neither near-end nor far-end participant may know whether or not their own echo canceller is working, making automated operation of the module 50 beneficial.

The module 50 for detecting and suppressing the returned audio can be used with Polycom's HDX9004 system. Using a sampling rate of 48 kHz, for example, the module 50 in the HDX9004 system can process 20-ms blocks of far-end and near-end audio at a time and can declare the presence or absence of returned near-end audio in the far-end audio block using a low computing cost of about 1.37 million instructions per second on a Trimedia PNX1700 chip used in the system. In addition, the module 50 can detect the very first occurrence of returned audio and reliably estimate a time delay for it within about 2 to 3 seconds of a participant's speech. At that point, the module 50 can mute the output audio in about 0.2 seconds of speech.

C. Multipoint Bridge Unit

Previous arrangements focused on using the module 50 in a conferencing unit (e.g., near-end unit). Yet, far-end echo is a common problem in multi-way calls having multiple sites jointly connected in a call. The more sites that are involved, the greater the chance that one site may lack an echo canceller to reduce the effects of acoustic coupling at the site. Therefore, the disclosed techniques for detecting and suppressing returned audio can also be used in a multipoint bridge unit that handles calls for multiple units. Suitable examples of multipoint bridge units include the MGC-25, MGC-F50 (RediConvene™), or MGC+100 (RediConvene™), which are unified multipoint conferencing bridges with integrated conference management available from Polycom, Inc., the assignee of the present disclosure.

As schematically shown in FIG. 6, a multipoint bridge unit 300 has a number of input/output connections 310A-N to various endpoints 10A-N that it handles for conference calls. These connections 310A-N couple to a common interface 320, which in turn connects to various audio modules 330A-N for each of the connected endpoints. As is typical, the audio modules 330A-N handle the audio for the endpoints 10A-N in the conference calls. For example, one audio module 330A can receive the input (near-end) audio from a dedicated endpoint 10A for sending on to one or more other endpoints in a conference call. Likewise, the same audio module 330A can send output (far-end) audio from the other endpoints to the dedicated endpoint 10A. To handle the sending and receiving of audio between the endpoints, each module 330 can have audio ports 340 and/or broadcast ports 345 for the endpoints in the call, depending on how the calls are set up. For example, an audio port 340 can be assigned to one of the endpoints in the call in which the endpoint dedicated to the module 330 is participating, and the port 340 can be used to send and receive audio with that endpoint. On the other hand, the broadcast port 345 can be assigned to one of the endpoints in the call that is only receiving audio.

In addition to the conventional components described above, the unit 300 has a module 350 for detecting and suppressing returned audio. When one of the particular endpoints 10 fails to provide sufficient echo cancellation (e.g., it lacks an echo canceller), the module 350 detects the presence of returned audio (echo) from that endpoint and prevents or suppresses that returned audio from being relayed to the other endpoints. Therefore, this module 350 can use the same processes discussed previously with respect to an endpoint unit.

Digital electronic circuitry, computer hardware, firmware, software, or any combination thereof can implement the techniques of the present disclosure, and a computer program tangibly embodied in a machine-readable medium for execution by a programmable processor can also implement the disclosed techniques so that a programmable processor can execute a program of instructions to perform functions of the disclosed techniques by operating on input data and generating output. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Accordingly, the near-end conferencing unit 10 of FIG. 2 and the multipoint bridge unit 300 of FIG. 6 can each have a processor (not shown) to execute instructions and handle data for the module 50/350.

Generally, a computer includes one or more mass storage devices (e.g., magnetic disks, internal hard disks, removable disks, magneto-optical disks, optical disks, etc.) for storing data files. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks and removable disks), magneto-optical disks, CD-ROM disks, and other computer-readable media. Any of the foregoing can be supplemented by or incorporated into application-specific integrated circuits.

The foregoing description of preferred and other embodiments is not intended to limit or restrict the scope or applicability of the inventive concepts conceived of by the Applicants. In exchange for disclosing the inventive concepts contained herein, the Applicants desire all patent rights afforded by the appended claims. Therefore, it is intended that the appended claims include all modifications and alterations to the full extent that they come within the scope of the following claims or the equivalents thereof.

What is claimed is:
 1. A method of suppressing near-end audioacoustically coupled at a far-end and returned to the near-end in aconference, the method comprising: determining first energy output ofnear-end audio sent from a near-end conferencing unit to a far-endconferencing unit by filtering the near-end audio into the first bandsand determining first energy outputs of each of the first bands;determining second energy output of far-end audio sent from the far-endconferencing unit to the near-end conferencing unit by filtering thefar-end audio into the second bands and determining second energyoutputs of each of the second bands; detecting a return of the near-endaudio in the far-end audio by cross-correlating variations in the firstand second energy outputs for each of the bands to one another atincrements of time delay over a time delay range, determining the timedelay having a greatest cross-correlation for all of the bands,estimating for each of the bands a ratio of the far-end energy to thenear-end energy, determining a result by multiplying the near-end energyby the ratio, and determining whether the result is equal to the far-endenergy, wherein the detected return is caused by the near-end audio sentfrom the near-end conferencing unit to the far-end conferencing unitbeing acoustically coupled at the far-end conferencing unit and returnedas part of the far-audio sent from the far-end conferencing unit; andsuppressing the detected return from output at the near-end conferencingunit.
 2. The method of claim 1, wherein the first and second bands eachcenter at 400, 800, 1200, 1600, and 2000 Hz.
 3. The method of claim 1,wherein comparing the first and second energy outputs to one anotherover the time delay range comprises comparing the first and secondenergy outputs for each of the bands to one another over the time delayrange.
 4. The method of claim 1, wherein detecting the return comprises:offsetting the near-end energy versus time to match the far-end energy;comparing the far-end energy outputs to the offset near-end energyoutputs times the ratio for each of the bands; and determining a countof how many of the bands have near-end energy outputs multiplied timesthe ratio that is at least less than or equal to an expected value. 5.The method of claim 4, wherein detecting the return comprises declaringa presence of the return when the determined count is greater than athreshold.
 6. The method of claim 1, wherein the time delay range isfrom 0 to about 3-seconds.
 7. The method of claim 1, wherein determiningthe first and second energy outputs is done at predetermined sampletimes.
 8. The method of claim 1, wherein suppressing the detected returnfrom output at the near-end conferencing unit comprises muting output ofthe far-end audio at a near-end loudspeaker.
 9. The method of claim 1,wherein suppressing the detected return at the near-end comprisesreducing output of the far-end audio at a near-end loudspeaker.
 10. Themethod of claim 1, wherein the detected return is suppressed at thenear-end conferencing unit for a predetermined time interval, untildetection of the return ceases, or until speech having a magnitudegreater than the return is detected in the far-end audio.
 11. Anon-transitory machine-readable medium having program instructionsstored thereon for causing a programmable control device to perform amethod of suppressing near-end audio acoustically coupled at a far-endand returned to the near-end in a conference, the method comprising:determining first energy output of near-end audio sent from a near-endconferencing unit to a far-end conferencing unit by filtering thenear-end audio into first bands and determining first energy outputs ofeach of the first bands; determining second energy output of far-endaudio sent from the far-end conferencing unit to the near-endconferencing unit by filtering the far-end audio into second bands anddetermining second energy outputs of each of the second bands; detectinga return of the near-end audio in the far-end audio by cross-correlatingvariations in the first and second energy outputs for each of the bandsto one another at increments of time delay over a time delay range,determining the time delay having a greatest cross-correlation for allof the bands, estimating for each of the bands a ratio of the far-endenergy to the near-end energy, determining a result by multiplying thenear-end energy by the ratio, and determining whether the result isequal to the far-end energy, wherein the detected return is caused bythe near-end audio sent from the near-end conferencing unit to thefar-end conferencing unit being acoustically coupled at the far-endconferencing unit and returned as part of the far-audio sent from thefar-end conferencing unit; and suppressing the detected return fromoutput at the near-end conferencing unit.
 12. A method of suppressing near-end audio acoustically coupled at a far-end and returned to the near-end in a conference, the method comprising: sending near-end audio from a near-end conferencing unit to a far-end conferencing unit; determining first energy outputs of first bands of the near-end audio at the near-end conferencing unit; receiving far-end audio from the far-end conferencing unit at the near-end conferencing unit; determining second energy outputs of each of second bands of the far-end audio received at the near-end conferencing unit; detecting a return of the sent near-end audio in the received far-end audio by cross-correlation variations in the first and second energy outputs for each of the bands to one another at increments of time delay over a time delay range and determining the time delay having a greatest cross-correlation for all of the bands, the detected return being caused by the near-end audio sent from the near-end conferencing unit to the far-end conferencing unit being acoustically coupled at the far-end conferencing unit and returned as part of the far-end audio received at the near-end conferencing unit; and suppressing the detected return from output at the near-end conferencing unit.
 13. The method of claim 12, wherein determining the first energy outputs of first bands of near-end audio comprises: obtaining the near-end audio at the near-end conferencing unit; filtering the near-end audio into the first bands at the near-end conferencing unit; and determining the first energy outputs of each of the first bands.
 14. The method of claim 12, wherein determining the second energy outputs of second bands of far-end audio comprises: obtaining the far-end audio received from the far-end conferencing unit at the near-end conferencing unit; filtering the far-end audio into the second bands at the near-end conferencing unit; and determining the second energy outputs of each of the second bands.
 15. A conferencing unit, comprising: a coder unit receiving near-end audio from a microphone and sending near-end audio signals to a far-end conferencing unit; a decoder unit receiving a far-end audio signal from the far-end conferencing unit and sending far-end audio for output at a loudspeaker; and a processor unit operatively coupled to the coder and decoder units and configured to: filter the near-end audio into first bands, determine first energy outputs of each of the first bands, filter the far-end audio into second bands, determine second energy outputs of each of the second bands, detect return caused by the near-end audio sent from the near-end conferencing unit to the far-end conferencing unit being acoustically coupled at the far-end conferencing unit and returned within the far-end audio received at the near-end conferencing unit, wherein to detect the return, the processor is configured to: cross-correlate variations in the first and second energy outputs to one another for each of the bands at increments of time delay over a time delay range, and determine the time delay having a greatest cross-correlation for all of the bands, and suppress sending the detected return to a near-end loudspeaker.
 16. The unit of claim 15, wherein the conferencing unit is a near-end conferencing unit communicatively coupled to the far-end conferencing network via a network.
 17. The unit of claim 15, wherein the conferencing unit is a multipoint conferencing bridge unit communicatively coupled to a near-conferencing unit and the far-end conferencing network via a network.
 18. The unit of claim 15, wherein to detect the return, the processor is configured to: estimate for each of the bands a ratio of the far-end energy to the near-end energy; determine a result by multiplying the near-energy by the ratio; and determine whether the result is equal to the far-end energy.
 19. The unit of claim 18, wherein to detect the return, the processor is configured to: offset the near-end energy versus time to match the far-end energy; compare the far-end energy outputs to the offset near-end energy outputs times the ratio for each of the bands; and determine a count of how many of the bands have near-end energy outputs multiplied times the ratio that is at least less than or equal to an expected value.
 20. The unit of claim 19, wherein to detect the return, the processor is configured to declare a presence of the return when the determined count is greater than a threshold.
 21. The unit of claim 15, wherein suppressing the detected return from output at the near-end conferencing unit comprises: muting output of the far-end audio at a near-end loudspeaker; or reducing output of the far-end audio at the near-end loudspeaker.
 22. The unit of claim 15, wherein the detected return is suppressed at the near-end conferencing unit for a predetermined time interval, until detection of the return ceases, or until speech having a magnitude greater than the return is detected in the far-end audio.
 23. The method of claim 12, wherein detecting the return comprises: estimating for each of the bands a ratio of the far-end energy to the near-end energy; determining a result by multiplying the near-energy by the ratio; and determining whether the result is equal to the far-end energy.
 24. The method of claim 23, wherein detecting the return comprises: offsetting the near-end energy versus time to match the far-end energy; comparing the far-end energy outputs to the offset near-end energy outputs times the ratio for each of the bands; and determining a count of how many of the bands have near-end energy outputs multiplied times the ratio that is at least less than or equal to an expected value.
 25. The method of claim 24, wherein detecting the return comprises declaring a presence of the return when the determined count is greater than a threshold.
 26. The method of claim 12, wherein suppressing the detected return from output at the near-end conferencing unit comprises: muting output of the far-end audio at a near-end loudspeaker; or reducing output of the far-end audio at the near-end loudspeaker.
 27. The method of claim 12, wherein the detected return is suppressed at the near-end conferencing unit for a predetermined time interval, until detection of the return ceases, or until speech having a magnitude greater than the return is detected in the far-end audio.
 28. The medium of claim 11, wherein comparing the first and second energy outputs to one another over the time delay range comprises comparing the first and second energy outputs for each of the bands to one another over the time delay range.
 29. The medium of claim 11, wherein detecting the return comprises: offsetting the near-end energy versus time to match the far-end energy; comparing the far-end energy outputs to the offset near-end energy outputs times the ratio for each of the bands; and determining a count of how many of the bands have near-end energy outputs multiplied times the ratio that is at least less than or equal to an expected value.
 30. The medium of claim 29, wherein detecting the return comprises declaring a presence of the return when the determined count is greater than a threshold.
 31. The medium of claim 11, wherein suppressing the detected return from output at the near-end conferencing unit comprises: muting output of the far-end audio at a near-end loudspeaker; or reducing output of the far-end audio at the near-end loudspeaker.
 32. The medium of claim 11, wherein the detected return is suppressed at the near-end conferencing unit for a predetermined time interval, until detection of the return ceases, or until speech having a magnitude greater than the return is detected in the far-end audio.
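
By way of illustration only, and without limiting the claims, the detection steps recited above (cross-correlating variations in the band energies over a time delay range, selecting the delay with the greatest cross-correlation for all bands, estimating a per-band ratio, and counting matching bands) might be sketched as follows. The sketch assumes the per-band energy tracks have already been computed, one value per frame. The function names, the ratio estimate (a ratio of summed energies), the `tolerance` parameter, and the relative-error test used to decide whether a band "matches" are all hypothetical choices made for this example and are not taken from the specification.

```python
def variations(energies):
    """Frame-to-frame changes in one band's energy track."""
    return [b - a for a, b in zip(energies, energies[1:])]


def cross_correlation(near, far, delay):
    """Sum of near[t] * far[t + delay] over the frames that overlap."""
    return sum(n * f for n, f in zip(near, far[delay:]))


def estimate_delay(near_bands, far_bands, max_delay):
    """Delay (in frames) whose cross-correlation, summed over all bands, is greatest."""
    near_var = [variations(band) for band in near_bands]
    far_var = [variations(band) for band in far_bands]
    best_delay, best_score = 0, float("-inf")
    for delay in range(max_delay + 1):
        score = sum(cross_correlation(n, f, delay)
                    for n, f in zip(near_var, far_var))
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay


def count_matching_bands(near_bands, far_bands, delay, tolerance=0.25):
    """Count bands whose far-end energy is close to ratio * offset near-end energy.

    Assumption: the ratio is estimated as summed far-end energy over summed
    near-end energy, and "close" means a mean relative error <= tolerance.
    """
    count = 0
    for near, far in zip(near_bands, far_bands):
        pairs = list(zip(near, far[delay:]))  # offset near-end against far-end
        near_total = sum(n for n, _ in pairs)
        far_total = sum(f for _, f in pairs)
        if near_total <= 0 or far_total <= 0:
            continue
        ratio = far_total / near_total
        error = sum(abs(f - ratio * n) for n, f in pairs) / far_total
        if error <= tolerance:
            count += 1
    return count


def detect_return(near_bands, far_bands, max_delay, band_threshold):
    """Declare a return when the matching-band count exceeds the threshold."""
    delay = estimate_delay(near_bands, far_bands, max_delay)
    count = count_matching_bands(near_bands, far_bands, delay)
    return count > band_threshold, delay
```

A unit applying such a test would then mute or reduce the far-end audio at its loudspeaker while the return is declared present. The test counts bands rather than requiring exact equality because far-end noise and codec distortion would be expected to perturb the energies in some bands.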